knitr::opts_chunk$set(echo = TRUE)
library(doSNOW)
library(dplyr)
library(purrr)
library(readr)
library(readxl)
library(ggplot2)
library(tidyr)

Introduction

gradescCM is a toy R package developed by Federico Cortese and Alessandro Mascaro and available on GitHub. It includes functions to perform linear regression and basic LOOCV (leave-one-out cross-validation) implementations to evaluate the linear fit. To install and load the package, the following code is sufficient:

# install.packages('devtools')
library(devtools)
install_github('FedericoCortese/R4DScm')
library(gradescCM)

Gradient descent and steepest descent

Below is an example of how to use the functions linear_gd_optim and linear_gd_optim2. The first function uses the analytical expression for the gradient of the loss function, whereas the second relies on the grad function from the numDeriv package. We therefore expect the first function to be faster than the second one. Consider the following simulated dataset:

library(gradescCM)
set.seed(8675309)
n = 1000
x1 = rnorm(n)
x2 = rnorm(n)
y = 1 + .5*x1 + .2*x2 + rnorm(n)
X=cbind(x1,x2)
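
To clarify what the two functions compute, here is a minimal sketch of the analytical gradient of the mean-squared-error loss for the model with intercept, together with the corresponding numerical gradient from numDeriv::grad. This is an illustration only, not the package code; the helper names mse_loss and mse_grad_analytical are hypothetical.

# Sketch (not the package code): analytical vs. numerical gradient of the MSE loss
# L(b) = mean((y - cbind(1, X) %*% b)^2), where b includes the intercept.
mse_loss = function(b, X, y) {
  mean((y - cbind(1, X) %*% b)^2)
}
mse_grad_analytical = function(b, X, y) {
  X1 = cbind(1, X)
  drop(-2 * t(X1) %*% (y - X1 %*% b) / length(y))
}
mse_grad_analytical(c(0, 0, 0), X, y)
numDeriv::grad(function(b) mse_loss(b, X, y), c(0, 0, 0))  # should agree up to numerical error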

We set the initial values for the parameter vector beta, together with the tolerance and maximum number of iterations:

b_pre=c(0,0,0)
tol=1e-6
maxit=10^5

For the first function we have

linear_gd_optim(b_pre,X,y,tol=tol,maxit=maxit)

For the second function

linear_gd_optim2(b_pre,X,y,tol=tol,maxit=maxit)

Let us compare their execution times:

# Helper to print a bench::mark result and plot it with ggplot2's autoplot()
show_bm = function(bm) {
  print(bm)
  autoplot(bm)
}

bench::mark(
  check = FALSE,
  first = linear_gd_optim(b_pre, X, y),
  second = linear_gd_optim2(b_pre, X, y)
) %>% show_bm()

The first version is clearly faster than the second one. Consider now the steepest descent method:

basic_sd(b_pre,X,y,tol=tol,maxit=maxit)

Let us compare the Gradient Descent and the Steepest Descent:

bench::mark(
  check = FALSE,
  GD = linear_gd_optim(b_pre, X, y),
  SD = basic_sd(b_pre, X, y)
) %>% show_bm()

As can be observed, the steepest descent algorithm is much faster than gradient descent. The main reason is that the step size used by steepest descent (which depends on the Hessian) is chosen optimally at each iteration, so the algorithm performs fewer iterations.
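
To make the step-size remark concrete, the following sketch shows one exact line-search update for the quadratic MSE loss: its Hessian H = 2 * t(X1) %*% X1 / n is constant, and the optimal step length along the negative gradient g is crossprod(g) / (t(g) %*% H %*% g). This is an assumption about the method in general, not necessarily how basic_sd is implemented, and the helper name steepest_step is hypothetical.

# One steepest-descent update for the MSE loss (illustrative sketch only)
steepest_step = function(b, X, y) {
  X1 = cbind(1, X)
  n = length(y)
  g = drop(-2 * t(X1) %*% (y - X1 %*% b) / n)       # gradient at b
  H = 2 * crossprod(X1) / n                         # constant Hessian of the MSE loss
  alpha = drop(crossprod(g) / (t(g) %*% H %*% g))   # exact line-search step length
  b - alpha * g                                     # updated parameter vector
}
steepest_step(b_pre, X, y)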

Leave-one-out cross validation

The following are two examples of usage of the functions cm_LOOCV and cm_pllLOOCV. Both functions compute, through LOOCV (a well-known resampling method), the mean squared error of prediction of a linear regression whose parameters are estimated with the steepest descent method. The first function performs a sequential LOOCV, while the second one parallelizes the computation, which may speed it up.
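
To illustrate the idea, here is a minimal sequential LOOCV sketch: each observation is left out in turn, the model is refit on the remaining data, and the squared prediction error on the left-out observation is averaged. For brevity the refit below uses lm.fit, whereas the package functions use the steepest-descent fit; the helper name loocv_mse is hypothetical and not part of the package.

# Minimal sequential LOOCV sketch (illustration only, not the package code)
loocv_mse = function(X, y) {
  n = length(y)
  errs = vapply(seq_len(n), function(i) {
    fit = lm.fit(cbind(1, X[-i, , drop = FALSE]), y[-i])  # refit without observation i
    pred = sum(c(1, X[i, ]) * fit$coefficients)           # predict the left-out observation
    (y[i] - pred)^2
  }, numeric(1))
  mean(errs)  # mean squared error of prediction
}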

As a first example, let's consider the usual dataset:

library(gradescCM)
set.seed(8675309)
n = 1000
x1 = rnorm(n)
x2 = rnorm(n)
y = 1 + .5*x1 + .2*x2 + rnorm(n)
X=cbind(x1,x2)

We set the initial values for the parameter vector beta, together with the tolerance and maximum number of iterations:

b_pre=c(0,0,0)
tol=1e-3
maxit=10^3

For the first function we have

cm_LOOCV(y = y, X = X, b_pre = b_pre, tol = tol, maxit = maxit)

For the second function

cm_pllLOOCV(y = y, X = X, b_pre = b_pre, tol = tol, maxit = maxit)

We can compare their efficiency as usual

bench::mark(
  check = FALSE,
  sequential = cm_LOOCV(y, X, b_pre = b_pre, tol = tol, maxit = maxit),
  parallel = cm_pllLOOCV(y, X, b_pre = b_pre, tol = tol, maxit = maxit)
) %>% show_bm()

As a second example, let's consider the following slightly modified dataset, in which the number of observations is scaled up to $2 \times 10^4$:

set.seed(8675309)
n = 2*10^4
x1 = rnorm(n)
x2 = rnorm(n)
y = 1 + .5*x1 + .2*x2 + rnorm(n)
X=cbind(x1,x2)
options(warn = -1)  # suppress warning messages in the chunks below

Again, we set the initial values for the parameter vector beta, together with the tolerance and maximum number of iterations:

b_pre=c(0,0,0)
tol=1e-3
maxit=10^3

We have

cm_LOOCV(y = y, X = X, b_pre = b_pre, tol = tol, maxit = maxit)
cm_pllLOOCV(y = y, X = X, b_pre = b_pre, tol = tol, maxit = maxit)

And again we compare the efficiency of the two algorithms

bench::mark(
  check = FALSE,
  sequential = cm_LOOCV(y, X, b_pre = b_pre, tol = tol, maxit = maxit),
  parallel = cm_pllLOOCV(y, X, b_pre = b_pre, tol = tol, maxit = maxit)
) %>% show_bm()

As can be observed, in the first example the sequential version of LOOCV is more efficient than the parallelized one, and vice versa in the second example. Hence, parallelizing the algorithm is not always computationally convenient: for small datasets the cost of setting up the parallelization may exceed the gain in efficiency. Clearly, the more computationally intensive the problem at hand, the greater the advantage of parallelization.
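
For reference, one common way to organise such a parallel LOOCV uses foreach with the doSNOW backend (which the setup chunk loads); the sketch below illustrates the general pattern, including the cluster set-up cost mentioned above, and is an assumption rather than the actual implementation of cm_pllLOOCV.

# Parallel LOOCV pattern with foreach + doSNOW (sketch; cm_pllLOOCV may differ)
library(foreach)
cl = snow::makeCluster(2)          # worker set-up: the fixed cost paid up front
doSNOW::registerDoSNOW(cl)
errs = foreach(i = seq_len(n), .combine = c) %dopar% {
  fit = lm.fit(cbind(1, X[-i, , drop = FALSE]), y[-i])   # refit without observation i
  pred = sum(c(1, X[i, ]) * fit$coefficients)
  (y[i] - pred)^2
}
snow::stopCluster(cl)
mean(errs)  # mean squared error of prediction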


